Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor serializer: rearrange code for clarity and introspection #1148

Merged
merged 12 commits into from
Jun 15, 2017

Conversation

dmsnell
Copy link
Member

@dmsnell dmsnell commented Jun 12, 2017

@see #948 for background exploration

These changes are intended to make the flow of transformation from
in-memory block data to serialized post_content clearer. Additionally
they are intended to ease testing and create points for easy trapping
and transformation of the data through the pipeline.

As I'm working in the serializer file to make harmonious updates with the parser changes I'm finding it unclear what is and what should be going on in serialization. Hopefully this PR will serve as a base for future work.

It makes subjective calls on style; I know. Please let me know if you don't like it. Performance shouldn't be an issue and if you think it will please respond with data backing that up. Unlike in #948 I tried to give a reasonable 👁 to performance in this incarnation and I don't anticipate it being reasonably less-performant than the current implementation in production.

My plans are to continue to augment this (and update the tests) so that we can have more control over the serialization process. This additional control might give us more ability to automatically detect if posts have been altered out of Gutenberg and if so, if they have been altered in a deleterious manner.

@dmsnell dmsnell added [Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Status] In Progress Tracking issues with work in progress labels Jun 12, 2017
@dmsnell dmsnell requested review from youknowriad, mtias and aduth June 12, 2017 20:55
@nylen
Copy link
Member

nylen commented Jun 12, 2017

This additional control might give us more ability to automatically detect if posts have been altered out of Gutenberg

As discussed elsewhere (#844 for example), this is pretty worrisome. We've chosen a data storage mechanism (post_content) which in software changes is ancient; it's our responsibility to recognize that it has other uses besides our project.

and if so, if they have been altered in a deleterious manner.

This is more like it. We should be validating inputs, not detecting changes.

Object.keys( realAttributes ),
Object.keys( expectedAttributes )
);
export function attributesToSave( allAttributes, fromContent ) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think getCommentAttributes was a better name. I would say the signature of this function should be:

function getCommentAttributes( allAttributes, attributesFromContent ) {

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that's fine with me. I found it confusing (which is why I changed it) but I don't care too much


return memo + `${ key }="${ value }" `;
}, '' );
const transform = key => escapeDoubleQuotes( allAttributes[ key ] );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Transforming should not really be the responsibility of the function that goes from all attributes to attributes saved in comments. This should be done during the serialization-to-string step.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fair point. the reason I did this was to avoid iterating twice. granted, that's an optimization

it did however seem reasonable to do here since this function was taking "attributes from a block" and returning "attributes that need to be saved in the comment header." if we don't transform here, we have a function which returns a set of attributes in a state that should never exist: "the group of attributes which need to serialize in the comment header but which can't be saved as-is because of potential serialization bugs"

thoughts?

// Iterate over attributes and produce the set to save
return reduce(
Object.keys( allAttributes ),
( toSave, key ) => Object.assign( toSave, isValid( key ) && { [ key ]: transform( key ) } ),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit too magical. What does Object.assign( toSave, false ) actually do?

Combined with the above comment, how about filter( allAttributes, ( value, key ) => isValid( key ) ) instead?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, do we really need isValid at all? Isn't it enough to say simply !! attributesFromContent[ key ] ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This feels a bit too magical.

I can see where it feels magical, but it's just the behavior of Object.assign() by specification.

how about filter

this seems very reasonable. I'll update that.

Also, do we really need isValid at all? Isn't it enough to say simply !! attributesFromContent[ key ] ?

well if that's all we said we'd be wrong in the case where allAttributes[ key ] is undefined and it doesn't exist in attributesFromContent. trying to pull out the actual operations that are happening to determine if a key is valid is part of why I created isValid() since before it was spread across different lines.

@dmsnell
Copy link
Member Author

dmsnell commented Jun 13, 2017

This is more like it. We should be validating inputs, not detecting changes.

@nylen this is the whole point. the goal isn't to slap people on the write for using an external editor, but it's to determine if any external edits caused any change in the structural content of the block and if so, present the user with some indication that decay has occurred.

@dmsnell dmsnell force-pushed the refactor/serializer branch from b4317d1 to 1ad4c77 Compare June 13, 2017 19:54
{ return keyValue( name, value ) }
/ name:HTML_Attribute_Name _* "=" _* "'" value:$(("\\'" . / !"'" .)*) "'"
/ name:HTML_Attribute_Name _* "=" _* "'" value:$(("\\" "'" . / !"'" .)*) "'"
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More on these changes later…

return memo + `${ key }="${ value }" `;
}, '' );
return ! ( contentValue !== undefined || allValue === undefined )
? Object.assign( toSave, { [ key ]: escapeDoubleQuotes( allValue ) } )
Copy link
Contributor

@youknowriad youknowriad Jun 13, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It feels a bit weird to me that we're encoding the attributes values in this function but we generate the key="value" in the other function. Serialization of the comment attributes is split between two methods.

I'd think we should avoid encoding here, or generate the complete string here because encoding the values without generating a string makes no sense to me.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rearranged in fdd4aa72

@dmsnell dmsnell force-pushed the refactor/serializer branch 2 times, most recently from eb87c72 to a0a8351 Compare June 14, 2017 23:30
dmsnell added 8 commits June 15, 2017 13:14
These changes are intended to make the flow of transformation from
in-memory block data to serialized `post_content` clearer. Additionally
they are intended to ease testing and create points for easy trapping
and transformation of the data through the pipeline.
@dmsnell dmsnell force-pushed the refactor/serializer branch from 01d9635 to f97ee6a Compare June 15, 2017 11:14
@@ -1,8 +1,14 @@
{

function untransformValue( value ) {
return 'string' === typeof value
? value.replace( /\\-/g, '-' )
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this right? What if you have the string "\\\\-"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this needs to pair with the serializer to make sure we don't get here. we have have to face escaping in some manner. doing it here I think is the easiest way to accomplish this: "You must escape hyphens. You need not escape anything else."

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, see #1088 for background on the issues with a double-hyphen in an HTML comment

Copy link
Contributor

@youknowriad youknowriad Jun 15, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this means we should escape \ as well?

@@ -36,35 +36,94 @@ export function getSaveContent( save, attributes ) {
return wp.element.renderToString( rawContent );
}

const escapeDoubleQuotes = value => value.replace( /"/g, '\"' );
const replaceHyphens = value => value.replace( /-/g, '\\-' );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe it should be escapeHyphens for consistency?

* @returns {*} transformed value
*/
const serializeValue = value =>
'string' === typeof value
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it is an array or object that contains strings?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

they shouldn't be. we should already be JSON-serializing data if not a number or string.

this will change if we stop using name="value" pairings, which is being worked on in another PR at the moment (not yet published)

Copy link
Contributor

@youknowriad youknowriad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@dmsnell dmsnell merged commit b2c83a7 into master Jun 15, 2017
@dmsnell dmsnell deleted the refactor/serializer branch June 15, 2017 13:16
@@ -127,7 +127,7 @@ describe( 'full post content fixture', () => {
}
}

expect( serializedActual ).to.eql( serializedExpected );
expect( serializedActual.trim() ).to.eql( serializedExpected.trim() );
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want (or need) to be forgiving with leading/trailing whitespace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion we shouldn't be testing for the specific presence of absence of it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Feature] Parsing Related to efforts to improving the parsing of a string of data and converting it into a different f [Status] In Progress Tracking issues with work in progress
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants